Unified Inter and Intra Options Learning Using Policy Gradient Methods
Authors
Abstract
Temporally extended actions (or macro-actions) have proven useful for speeding up planning and learning, adding robustness, and building prior knowledge into AI systems. The options framework, as introduced in Sutton, Precup and Singh (1999), provides a natural way to incorporate macro-actions into reinforcement learning. In the subgoals approach, learning is divided into two phases, first learning each option with a prescribed subgoal, and then learning to compose the learned options together. In this paper we offer a unified framework for concurrent inter- and intra-option learning. To that end, we propose a modular parameterization of intra-option policies together with option termination conditions and the option selection policy (inter options), and show that these three decision components may be viewed as a unified policy over an augmented state-action space, to which standard policy gradient algorithms may be applied. We identify the basis functions that apply to each of these decision components, and show that they possess a useful orthogonality property that allows the natural gradient to be computed independently for each component. We further outline the extension of the suggested framework to several levels of options hierarchy, and conclude with a brief illustrative example.
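To make the augmented-policy view concrete, the following is a minimal sketch of how the three decision components might be parameterized and sampled as one factored policy. All names, parameterizations (sigmoid termination, softmax option selection and intra-option policies), and tabular shapes are illustrative assumptions, not the paper's implementation; the point is only that the log-probability of one augmented step decomposes additively across the three components, so standard policy gradient estimators apply to the augmented policy as a whole.

```python
import numpy as np

class AugmentedOptionsPolicy:
    """Illustrative tabular sketch of a unified (inter + intra) options policy.

    The augmented state is the pair (s, o): environment state plus the
    currently active option. One augmented action factors into three
    decision components:
      beta(terminate | s, o)  -- option termination condition
      mu(o' | s)              -- option selection policy (inter-option)
      pi_o(a | s, o)          -- intra-option policy
    """

    def __init__(self, n_states, n_options, n_actions, seed=0):
        self.rng = np.random.default_rng(seed)
        # Disjoint parameter blocks for the three decision components;
        # their separation is what permits per-component gradients.
        self.theta_pi = np.zeros((n_options, n_states, n_actions))
        self.theta_beta = np.zeros((n_options, n_states))
        self.theta_mu = np.zeros((n_states, n_options))

    @staticmethod
    def _softmax(x):
        z = np.exp(x - x.max())
        return z / z.sum()

    def step(self, s, o):
        """Sample one augmented action and return (a, o, log_prob).

        The total log-probability is the sum of the component
        log-probabilities, so a REINFORCE-style policy gradient
        estimator applies directly to the unified policy.
        """
        logp = 0.0
        # 1) Termination: Bernoulli via a sigmoid (assumed parameterization).
        p_term = 1.0 / (1.0 + np.exp(-self.theta_beta[o, s]))
        terminate = self.rng.random() < p_term
        logp += np.log(p_term if terminate else 1.0 - p_term)
        # 2) On termination, pick a new option from a softmax over options.
        if terminate:
            probs_o = self._softmax(self.theta_mu[s])
            o = int(self.rng.choice(len(probs_o), p=probs_o))
            logp += np.log(probs_o[o])
        # 3) Primitive action from the active option's intra-option policy.
        probs_a = self._softmax(self.theta_pi[o, s])
        a = int(self.rng.choice(len(probs_a), p=probs_a))
        logp += np.log(probs_a[a])
        return a, o, logp
```

Because the three parameter blocks are disjoint, each component's score function depends only on its own parameters, which loosely mirrors the orthogonality property the abstract exploits to compute the natural gradient independently per component.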
Similar resources
The Option-Critic Architecture
Temporal abstraction is key to scaling up learning and planning in reinforcement learning. While planning with temporally extended actions is well understood, creating such abstractions autonomously from data has remained challenging. We tackle this problem in the framework of options [Sutton, Precup & Singh, 1999; Precup, 2000]. We derive policy gradient theorems for options and propose a new ...
Intra-Option Learning about Temporally Abstract Actions
Several researchers have proposed modeling temporally abstract actions in reinforcement learning by the combination of a policy and a termination condition, which we refer to as an "option". Value functions over options and models of options can be learned using methods designed for semi-Markov decision processes (SMDPs). However, these methods all require an option to be executed to terminatio...
An Inference-based Policy Gradient Method
In the pursuit of increasingly intelligent learning systems, abstraction plays a vital role in enabling sophisticated decisions to be made in complex environments. The options framework provides formalism for such abstraction over sequences of decisions. However most models require that options be given a priori, presumably specified by hand, which is neither efficient, nor scalable. Indeed, it...
Policy Learning - A Unified Perspective with Applications in Robotics
Policy Learning approaches are among the best suited methods for high-dimensional, continuous control systems such as anthropomorphic robot arms and humanoid robots. In this paper, we show two contributions: firstly, we show a unified perspective which allows us to derive several policy learning algorithms from a common point of view, i.e., policy gradient algorithms, natural-gradient algorithms ...
Publication date: 2011